Keyword [StackGAN]
Zhang H, Xu T, Li H, et al. Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks[C]//IEEE Int. Conf. Comput. Vision (ICCV). 2017: 5907-5915.
1. Overview
1.1. Motivation
- images generated by existing methods lack necessary details and vivid object parts
- GAN training is unstable
- the limited number of training text-image pairs often results in sparsity in the text conditioning manifold, and such sparsity makes it difficult to train the GAN
The paper proposes StackGAN
- decompose the hard problem into more manageable sub-problems
- Stage-I. sketches the primitive shape and colors of the object, producing a low-resolution image
- Stage-II. takes the Stage-I result and adds details, producing a high-resolution image
- Conditioning Augmentation technique. encourages smoothness in the latent conditioning manifold
1.2. Contribution
- StackGAN
- Conditioning Augmentation (CA)
1.3. Related Work
1.3.1. Generative Model
- VAE
- Pixel RNN
- GAN
- energy-based GAN
1.3.2. Conditional Image Generation
- conditioned on variables such as attributes or class labels
- image-to-image. photo editing, domain transfer, super-resolution (SR)
1.3.3. Series of GAN
2. StackGAN
2.1. Conditioning Augmentation
- the latent space of the text embedding is usually high-dimensional; with a limited amount of data, this causes discontinuity in the latent data manifold
- CA yields more training pairs, encourages smoothness over the conditioning manifold, and helps avoid overfitting
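A minimal sketch of the CA sampling step, assuming the conditioning vector is drawn from a Gaussian whose mean and diagonal variance are computed from the text embedding (in the paper by a fully connected layer; here they are simply passed in). The function name and the log-variance parameterization are assumptions made for illustration. The returned KL term is the regularizer against N(0, I).

```python
import math
import random

def conditioning_augmentation(mu, log_var, rng=random):
    """Sample c_hat = mu + sigma * eps with eps ~ N(0, I); also return the KL term.

    mu, log_var: lists of floats (hypothetical outputs of a layer applied to
    the text embedding). log_var holds log(sigma^2) per dimension.
    """
    sigma = [math.exp(0.5 * lv) for lv in log_var]
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    c_hat = [m + s * e for m, s, e in zip(mu, sigma, eps)]
    # KL(N(mu, diag(sigma^2)) || N(0, I))
    #   = 0.5 * sum(sigma^2 + mu^2 - 1 - log(sigma^2))
    kl = 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))
    return c_hat, kl
```

Because a fresh eps is drawn on every call, the same text embedding yields a different conditioning vector each time, which is how one text-image pair turns into many training pairs.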
2.2. Stage-I GAN
- λ. weight of the KL regularization term in the generator loss, set to 1
- I_0. real image
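For reference, the Stage-I objectives that these symbols belong to can be written as below. The notation follows the paper rather than these notes: φ_t is the text embedding, ĉ_0 the conditioning vector sampled by CA, z the noise vector. D_0 is trained to maximize the first expression and G_0 to minimize the second.

```latex
\begin{aligned}
\mathcal{L}_{D_0} &= \mathbb{E}_{(I_0, t) \sim p_{data}}\left[\log D_0(I_0, \varphi_t)\right]
  + \mathbb{E}_{z \sim p_z,\, t \sim p_{data}}\left[\log\left(1 - D_0(G_0(z, \hat{c}_0), \varphi_t)\right)\right] \\
\mathcal{L}_{G_0} &= \mathbb{E}_{z \sim p_z,\, t \sim p_{data}}\left[\log\left(1 - D_0(G_0(z, \hat{c}_0), \varphi_t)\right)\right]
  + \lambda\, D_{KL}\left(\mathcal{N}(\mu_0(\varphi_t), \Sigma_0(\varphi_t)) \,\|\, \mathcal{N}(0, I)\right)
\end{aligned}
```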
2.3. Stage-II GAN
- s_0. low-resolution image generated by Stage-I
- the two stages share the same text encoder but use different CA layers
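The Stage-II objectives have the same shape as those of Stage-I, with the low-resolution result s_0 taking the place of the noise vector. The notation again follows the paper: φ_t is the text embedding and ĉ the conditioning vector sampled by the Stage-II CA layer. D maximizes the first expression and G minimizes the second.

```latex
\begin{aligned}
\mathcal{L}_{D} &= \mathbb{E}_{(I, t) \sim p_{data}}\left[\log D(I, \varphi_t)\right]
  + \mathbb{E}_{s_0 \sim p_{G_0},\, t \sim p_{data}}\left[\log\left(1 - D(G(s_0, \hat{c}), \varphi_t)\right)\right] \\
\mathcal{L}_{G} &= \mathbb{E}_{s_0 \sim p_{G_0},\, t \sim p_{data}}\left[\log\left(1 - D(G(s_0, \hat{c}), \varphi_t)\right)\right]
  + \lambda\, D_{KL}\left(\mathcal{N}(\mu(\varphi_t), \Sigma(\varphi_t)) \,\|\, \mathcal{N}(0, I)\right)
\end{aligned}
```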
2.4. Details
- first train Stage-I GAN with Stage-II GAN fixed
- then train Stage-II GAN with Stage-I GAN fixed
- Adam optimizer, learning rate 0.0002, decay 0.5, mini-batch size 64
- nearest-neighbour upsampling
- dimension of z. 100
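As a small illustration of the nearest-neighbour upsampling mentioned above, the sketch below enlarges a 2-D grid of pixel values by repeating each pixel along both axes. The function name and the default scale factor of 2 are assumptions for illustration; a real generator would apply this to feature maps and follow it with a convolution.

```python
def upsample_nearest(image, factor=2):
    """Repeat every pixel `factor` times along both axes of a 2-D grid."""
    out = []
    for row in image:
        # Widen the row: each pixel appears `factor` times in a row.
        wide = [px for px in row for _ in range(factor)]
        # Stack `factor` independent copies of the widened row.
        out.extend(list(wide) for _ in range(factor))
    return out
```

For example, `upsample_nearest([[1, 2], [3, 4]])` turns a 2x2 grid into a 4x4 grid made of four constant 2x2 blocks.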
3. Experiments
3.1. Dataset
- MSCOCO
- CUB
3.2. Metric
- Inception Score
- x. generated sample
- y. label predicted by the Inception model (fine-tuned on the experiment dataset)
- Human Evaluation
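The Inception Score listed above is commonly defined as exp(E_x[KL(p(y|x) || p(y))]), with x and y as in the notes. A minimal sketch of that computation follows, assuming the per-sample class distributions p(y|x) have already been obtained from the fine-tuned Inception model; the function name and the list-of-lists input format are assumptions for illustration.

```python
import math

def inception_score(probs):
    """probs: list of per-sample class distributions p(y|x), each summing to 1."""
    n = len(probs)
    k = len(probs[0])
    # Marginal label distribution p(y), averaged over all generated samples.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    total_kl = 0.0
    for p in probs:
        # KL(p(y|x) || p(y)); terms with p_j = 0 contribute nothing.
        total_kl += sum(pj * math.log(pj / marginal[j])
                        for j, pj in enumerate(p) if pj > 0)
    return math.exp(total_kl / n)
```

The score is 1 when every sample gets the same prediction and approaches the number of classes when predictions are confident and spread evenly across classes, so higher values indicate samples that are both recognizable and diverse.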
3.3. Comparison
- GAN-INT-CLS. only reflects the general shape and color of the birds
- GAWWN. fails to generate plausible images
- Stage-II GAN can correct the defects of Stage-I
- even when Stage-I fails to draw a plausible shape, Stage-II can generate a reasonable object
3.4. Ablation Study
- CA helps stabilize training and improves the diversity of generated samples, because it encourages robustness to small perturbations along the latent manifold